Short: V1.0 Extract URLs from any file +sort++
Uploader: frans@xfilesystem.freeserve.co.uk (francis swift)
Author: frans@xfilesystem.freeserve.co.uk (francis swift)
Type: comm/www
URL: www.xfilesystem.freeserve.co.uk
These are some quick'n'nasty hacks, but I've included the source for you to
look at, especially as urlx uses btree routines and there aren't many simple
examples of using btrees.
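
If you just want the flavour of that without reading the source: the idea is
to insert every URL found into a tree keyed on the URL string and drop any
string that is already present, so duplicates vanish for free and an in-order
walk gives sorted output. A minimal sketch of that idea, using a plain binary
search tree rather than the btree routines urlx actually uses (all names
below are made up for illustration):

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* One node per unique URL, with children for ordering. */
    struct node {
        char *url;
        struct node *left, *right;
    };

    /* Insert url into the tree rooted at *rootp.
       Returns 1 if it was new, 0 if it was a duplicate. */
    static int insert(struct node **rootp, const char *url)
    {
        int cmp;

        if (*rootp == NULL) {
            struct node *n = malloc(sizeof(*n));
            if (n == NULL)
                return 0;
            n->url = strdup(url);
            n->left = n->right = NULL;
            *rootp = n;
            return 1;
        }
        cmp = strcmp(url, (*rootp)->url);
        if (cmp == 0)
            return 0;                   /* duplicate, drop it */
        return insert(cmp < 0 ? &(*rootp)->left : &(*rootp)->right, url);
    }

    /* An in-order walk prints the stored URLs in sorted order. */
    static void dump(const struct node *n, FILE *out)
    {
        if (n == NULL)
            return;
        dump(n->left, out);
        fprintf(out, "%s\n", n->url);
        dump(n->right, out);
    }

A btree does the same job but stays balanced however the URLs happen to
arrive, which matters once the input gets big.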
urlx
----
This program searches a file for URLs (http:// etc.) and prints them or
writes them to a file. Internally it stores them in a btree, which allows
duplicates to be eliminated and, optionally, the output to be sorted. There
are two sorts available: -s selects a simple alphabetic sort, and -u selects
a special URL sort that should provide better grouping of similar site names
(basically it sorts the first URL element in groups, backwards; see the
sketch below). The output can be either straight text or, by selecting -h,
HTML format for making quick bookmark files. By default any parameters after
the URL are ignored, but they can be kept with -p. You can also output just
one type of file by selecting the extension with -.ext; for example, to show
only .jpg URLs you would use -.jpg, and for .html you would use -.htm (which
matches both .htm and .html). A better solution for this last case is the -i
flag, which selects not only .html extensions but also paths where a default
HTML file would be expected.
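
The readme doesn't spell out exactly what the -u key looks like, but one
reading of "sorts first url element in groups backwards" is that the host
part is split on the dots and its groups are compared in reverse order, so
www.foo.co.uk and ftp.foo.co.uk end up next to each other. A rough sketch of
building such a key (this interpretation is a guess, not necessarily what
urlx really does):

    #include <stdio.h>
    #include <string.h>

    /* Build a sort key by reversing the dot-separated groups of a host
       name: "www.foo.co.uk" becomes "uk.co.foo.www".  Sorting on this key
       groups hosts by domain rather than by the leading "www"/"ftp".
       key must have room for strlen(host) + 1 bytes. */
    static void host_key(const char *host, char *key)
    {
        const char *end = host + strlen(host);
        const char *p = end;

        key[0] = '\0';
        while (p > host) {
            const char *dot = p;

            while (dot > host && dot[-1] != '.')
                dot--;                  /* find the start of this group */
            strncat(key, dot, (size_t)(p - dot));
            p = dot;
            if (p > host) {
                strcat(key, ".");
                p--;                    /* step over the '.' itself */
            }
        }
    }

    int main(void)
    {
        char key[64];

        host_key("www.xfilesystem.freeserve.co.uk", key);
        printf("%s\n", key);  /* prints uk.co.freeserve.xfilesystem.www */
        return 0;
    }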
Basically there are lots of options but you'll probably just end up using:
urlx -u infile outfile
which uses the special URL sort, or
urlx -u -h infile outfile.html
for making a bookmark file.
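
The -h output only has to be something a browser will load as a page of
clickable links, so presumably one anchor per URL along these lines (the
exact markup urlx writes is an assumption):

    #include <stdio.h>

    /* Write one URL as a link line of a minimal HTML bookmark page. */
    static void emit_bookmark(FILE *out, const char *url)
    {
        fprintf(out, "<a href=\"%s\">%s</a><br>\n", url, url);
    }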
treecat
-------
This is just a quick hack to let shell (sh/pdksh) users grab URLs from a
complete directory tree. urlx accepts a single dash to mean that input comes
from stdin, so you can use something like
treecat dh0:Voyager/cache | urlx -u - outfile
to produce a file containing every URL in every file in your Voyager cache.
You can use this on any browser cache tree.
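
treecat itself is not listed here, but the job it does is simple: walk the
directory tree and copy every plain file to stdout, so urlx can read the lot
from a single pipe. A portable sketch of that walk, using POSIX directory
calls rather than whatever the original hack does on the Amiga side:

    #include <stdio.h>
    #include <string.h>
    #include <dirent.h>
    #include <sys/types.h>
    #include <sys/stat.h>

    /* Copy one file to stdout. */
    static void cat_file(const char *path)
    {
        char buf[8192];
        size_t n;
        FILE *f = fopen(path, "rb");

        if (f == NULL)
            return;
        while ((n = fread(buf, 1, sizeof(buf), f)) > 0)
            fwrite(buf, 1, n, stdout);
        fclose(f);
    }

    /* Recurse through a directory tree, catting every regular file. */
    static void treecat(const char *dir)
    {
        DIR *d = opendir(dir);
        struct dirent *e;
        struct stat st;
        char path[1024];

        if (d == NULL)
            return;
        while ((e = readdir(d)) != NULL) {
            if (strcmp(e->d_name, ".") == 0 || strcmp(e->d_name, "..") == 0)
                continue;
            snprintf(path, sizeof(path), "%s/%s", dir, e->d_name);
            if (stat(path, &st) != 0)
                continue;
            if (S_ISDIR(st.st_mode))
                treecat(path);
            else if (S_ISREG(st.st_mode))
                cat_file(path);
        }
        closedir(d);
    }

    int main(int argc, char **argv)
    {
        if (argc > 1)
            treecat(argv[1]);
        return 0;
    }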
scanv
-----
This is used specifically to pick out the URLs from the headers of the files
in a Voyager cache, i.e. the URL of each file itself; the program doesn't
look in the file contents for any other URLs. Use treecat|urlx for that.
urlv
----
This is used specifically to grab URLs from a Voyager history file, usually
called URL-History.1.
urla
----
This is used specifically to grab URLs from an AWeb cache index file,
usually called AWCR.